Machine translation in continuous space
Authors
Abstract
We present a different perspective on the machine translation problem that relies upon continuous-space probabilistic models for words and phrases. Within this perspective we propose a method called Tied-Mixture Machine Translation (TMMT) that uses a trainable parametric model employing Gaussian mixture probability density functions to represent word and phrase pairs. In the new perspective, machine translation is treated in the same way as acoustic modeling in speech recognition. This new treatment carries several potential advantages that may improve state-of-the-art machine translation systems, including better generalization to unseen events; adaptation to new domains, languages, genres, and speakers via methods such as Maximum-Likelihood Linear Regression (MLLR); and improved discrimination through discriminative training methods such as Maximum Mutual Information Estimation (MMIE). Our goal in this paper, however, is to introduce the new approach and demonstrate its viability, leaving investigation of some of the potential advantages to future work. To this end, we report some preliminary experiments demonstrating the viability of the proposed method.
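As an illustration of the tied-mixture idea named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that each word or phrase pair has already been mapped to a continuous vector (the abstract does not say how), that all pairs share one codebook of diagonal-covariance Gaussians, and that each pair owns only its mixture weights. The names `tied_mixture_logpdf`, `pair_a`, and `pair_b`, along with all numeric values, are hypothetical.

```python
import math


def diag_gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def tied_mixture_logpdf(x, weights, means, variances):
    """Log of sum_k weights[k] * N(x; means[k], variances[k]).

    The Gaussians (means, variances) are shared by every modeled unit;
    only `weights` is specific to one unit. That sharing is what makes
    the mixture "tied".
    """
    log_terms = [
        math.log(w) + diag_gaussian_logpdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
        if w > 0.0  # skip components this unit does not use
    ]
    peak = max(log_terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


# Hypothetical shared codebook of two 2-D Gaussians.
means = [(0.0, 0.0), (3.0, 3.0)]
variances = [(1.0, 1.0), (1.0, 1.0)]

# Hypothetical per-pair mixture weights over the shared codebook.
pair_weights = {
    "pair_a": [0.9, 0.1],
    "pair_b": [0.2, 0.8],
}

# Hypothetical continuous-space vector for an observed pair.
x = (0.2, -0.1)

scores = {
    name: tied_mixture_logpdf(x, w, means, variances)
    for name, w in pair_weights.items()
}
best = max(scores, key=scores.get)  # the pair whose density is highest at x
```

Because the Gaussians are shared, adapting or retraining the codebook affects every pair at once, which is the property that makes speech-recognition techniques such as MLLR applicable in principle.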
Similar resources
A new model for persian multi-part words edition based on statistical machine translation
Multi-part words in English are hyphenated, with the hyphen separating the different parts. Persian has multi-part words as well. Under Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead. This common incorrect use of the space leads to some s...
A Continuous Plane Model to Machine Layout Problems Considering Pick-Up and Drop-Off Points: An Evolutionary Algorithm
One of the well-known evolutionary algorithms inspired by biological evolution is genetic algorithm (GA) that is employed as a robust and global optimization tool to search for the best or near-optimal solution with the search space. In this paper, this algorithm is used to solve unequal-sized machines (or intra-cell) layout problems considering pick-up and drop-off (input/output) points. Such p...
Continuous-Space Language Models for Statistical Machine Translation
This paper describes an open-source implementation of the so-called continuous space language model and its application to statistical machine translation. The underlying idea of this approach is to attack the data sparseness problem by performing the language model probability estimation in a continuous space. The projection of the words and the probability estimation are both performed by a mul...
Learning Continuous Phrase Representations for Translation Modeling
This paper tackles the sparsity problem in estimating phrase translation probabilities by learning continuous phrase representations, whose distributed nature enables the sharing of related phrases in their representations. A pair of source and target phrases are projected into continuous-valued vector representations in a low-dimensional latent space, where their translation score is computed ...
Smooth Bilingual N-Gram Translation
We address the problem of smoothing translation probabilities in a bilingual N-gram-based statistical machine translation system. It is proposed to project the bilingual tuples onto a continuous space and to estimate the translation probabilities in this representation. A neural network is used to perform the projection and the probability estimation. Smoothing probabilities is most important fo...
s-Topological vector spaces
In this paper, we have defined and studied a generalized form of topological vector spaces called s-topological vector spaces. s-topological vector spaces are defined by using semi-open sets and semi-continuity in the sense of Levine. Along with other results, it is proved that every s-topological vector space is generalized homogeneous space. Every open subspace of an s-topological vector space is...